Foodborne illness is a serious but preventable public health problem: delayed detection of related outbreaks results in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine-learning-based models for foodborne outbreak detection, we present Tweet-FID (Tweet-Foodborne Illness Detection), the first publicly annotated dataset for multiple foodborne illness incident detection tasks. Tweet-FID, collected from Twitter, is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling used to support model development for these tasks. Comprehensive results for these tasks are provided, leveraging state-of-the-art single-task and multi-task deep learning methods on the Tweet-FID dataset. This dataset opens up opportunities for future research on foodborne outbreak detection.
translated by Google Translate
Learning radiance fields has shown remarkable results in novel view synthesis. The learning procedure usually costs a large amount of time, which motivates recent methods to speed up learning by dropping neural networks or using more efficient data structures. However, these specially designed approaches do not apply to most radiance-field-based methods. To resolve this issue, we introduce a general strategy to speed up the learning procedure of almost all radiance-field-based methods. Our key idea is to reduce redundancy by shooting much fewer rays in the multi-view volume rendering procedure, which is the basis of almost all radiance-field-based methods. We find that shooting rays at pixels with dramatic color change not only significantly reduces the training burden but also barely affects the accuracy of the learned radiance fields. In addition, we adaptively subdivide each view into a quadtree according to the average rendering error in each node of the tree, which allows us to dynamically shoot more rays in more complex regions with larger rendering errors. We evaluate our method with different radiance-field-based methods under widely used benchmarks. Experimental results show that our method achieves accuracy comparable to the state of the art with much faster training.
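The error-guided quadtree subdivision described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the per-pixel error grid, the `subdivide` function, and the leaf-per-region ray budget are all assumptions made for the example.

```python
# Hypothetical sketch of error-guided quadtree subdivision: regions of a
# view with high average rendering error are split further, so smaller
# leaves (and hence more rays) land on more complex areas.

def mean_error(err, x0, y0, x1, y1):
    """Average rendering error over a rectangular image region."""
    total = sum(err[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return total / ((x1 - x0) * (y1 - y0))

def subdivide(err, x0, y0, x1, y1, threshold, min_size=2, leaves=None):
    """Recursively split a view into quadtree leaves; a region is split
    while its mean rendering error exceeds `threshold` and it is larger
    than `min_size`. Returns (x0, y0, x1, y1, mean_error) per leaf."""
    if leaves is None:
        leaves = []
    e = mean_error(err, x0, y0, x1, y1)
    w, h = x1 - x0, y1 - y0
    if e <= threshold or w <= min_size or h <= min_size:
        leaves.append((x0, y0, x1, y1, e))
        return leaves
    mx, my = x0 + w // 2, y0 + h // 2
    for a, b, c, d in [(x0, y0, mx, my), (mx, y0, x1, my),
                       (x0, my, mx, y1), (mx, my, x1, y1)]:
        subdivide(err, a, b, c, d, threshold, min_size, leaves)
    return leaves
```

A ray budget could then be allocated per leaf in proportion to its recorded error, concentrating sampling where rendering is hardest.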
Federated learning (FL) is a technique for training machine learning models from decentralized data sources. We study FL under the local notion of privacy constraints, which provides strong protection against sensitive data disclosure by obfuscating the data before it leaves the client. We identify two major concerns in designing practical privacy-preserving FL algorithms: communication efficiency and compatibility with high dimensionality. We then develop a gradient-based learning algorithm called SQSGD (selective quantized stochastic gradient descent) that addresses both concerns. The proposed algorithm is based on a novel privacy-preserving quantization scheme that uses a constant number of bits per dimension per client. We then improve the base algorithm in three ways: first, we apply a gradient subsampling strategy that simultaneously offers better training performance and smaller communication costs under a fixed privacy budget. Second, we utilize randomized rotation as a preprocessing step to reduce quantization error. Third, an adaptive gradient norm upper bound strategy is adopted to improve accuracy and stabilize training. Finally, the practicality of the proposed framework is demonstrated on benchmark datasets. Experimental results show that SQSGD successfully learns large models like LeNet and ResNet under local privacy constraints. In addition, with fixed privacy and communication levels, the performance of SQSGD significantly dominates that of various baseline algorithms.
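The constant-bits-per-dimension idea rests on unbiased stochastic quantization of clipped gradients. A minimal sketch of the communication side follows, with the paper's privacy mechanism deliberately omitted; the function name, the uniform level grid, and the clipping scheme are illustrative assumptions, not SQSGD's exact construction.

```python
import random

def stochastic_quantize(v, bits, clip):
    """Quantize each coordinate of v onto 2**bits uniform levels in
    [-clip, clip], rounding up or down at random so the result is
    unbiased in expectation. Each coordinate then costs `bits` bits to
    transmit, independent of model dimension per coordinate."""
    levels = (1 << bits) - 1  # number of intervals between levels
    out = []
    for x in v:
        x = max(-clip, min(clip, x))           # gradient norm clipping
        t = (x + clip) / (2 * clip) * levels   # position on the level grid
        lo = int(t)
        # round up with probability equal to the fractional part
        q = lo + (1 if random.random() < (t - lo) else 0)
        q = min(q, levels)
        out.append(q * 2 * clip / levels - clip)
    return out
```

Each quantized value stays within one grid step of the clipped input, so the error shrinks geometrically as `bits` grows while the per-coordinate communication cost stays constant.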
Graph convolutional networks (GCNs) have been shown to be vulnerable to small adversarial perturbations, which has become a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research effort has been devoted to increasing the robustness of GCNs against adversarial attacks. However, current defense approaches are typically designed for the whole graph and consider global performance, facing challenges in protecting important local nodes from stronger adversarial targeted attacks. In this work, we present a simple yet effective method, named Graph Universal Adversarial Defense (GUARD). Unlike previous works, GUARD protects each individual node from attacks with a universal defensive patch, which is generated once and can be applied to any node (node-agnostic) in the graph. Extensive experiments on four benchmark datasets demonstrate that our method significantly improves the robustness of several established GCNs against multiple adversarial attacks and outperforms state-of-the-art defense methods by large margins. Our code is publicly available at https://github.com/edisonleeeeee/guard.
Multi-tenant machine learning services have become emerging data-intensive workloads in data centers with heavy usage of GPU resources. Due to the large scale, many tuning parameters, and heavy resource usage, it is usually impractical to evaluate and benchmark machine learning services on real clusters. In this demonstration, we present AnalySim, a cluster simulator that allows efficient design exploration for multi-tenant machine learning services. Specifically, with trace-driven cluster workload simulation, AnalySim can easily test and analyze various scheduling policies in terms of a number of performance metrics, such as GPU resource utilization. AnalySim simulates cluster computational resources based on both physical topology and logical partitioning. The tool has been used to understand the impact of different scheduling policies with a trace from a real production cluster of over 1000 GPUs. We find that preemption and migration are able to significantly reduce the average job completion time and mitigate the resource fragmentation problem.
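The kind of policy comparison described above can be illustrated with a toy single-GPU, unit-time-step simulation; this is entirely hypothetical and far simpler than AnalySim, but it shows why preemption reduces the average job completion time (JCT) when short jobs arrive behind long ones.

```python
def avg_jct(jobs, preemptive):
    """Toy trace-driven simulation of one GPU in unit time steps.
    `jobs` is a list of (arrival_time, duration). Scheduling is either
    FIFO run-to-completion or preemptive shortest-remaining-first.
    Returns the average job completion time (finish - arrival)."""
    remaining = {j: d for j, (a, d) in enumerate(jobs)}
    finished, t, total = 0, 0, 0.0
    running = None
    while finished < len(jobs):
        ready = [j for j in remaining if jobs[j][0] <= t]
        if ready:
            # pick a new job if preempting, or if the current one is done
            if preemptive or running not in remaining:
                running = min(ready, key=lambda j: remaining[j]
                              if preemptive else jobs[j][0])
            remaining[running] -= 1
            if remaining[running] == 0:
                del remaining[running]
                total += t + 1 - jobs[running][0]
                finished += 1
        t += 1
    return total / len(jobs)
```

With a long job arriving just before a short one, FIFO makes the short job wait out the long one, while the preemptive policy finishes it almost immediately, cutting the average JCT.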
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or can only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
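Finding 1) above — matching how the teacher relates tokens rather than its raw features — can be sketched as follows. The relation definition (row-softmax over scaled token dot products) and the MSE loss are simplified stand-ins for illustration, not TinyMIM's exact formulation.

```python
import math

def token_relations(feats):
    """Pairwise token-relation matrix: row-softmax over scaled dot
    products of token features (a simplified stand-in for Q-K
    attention relations). feats is a list of token vectors."""
    n, d = len(feats), len(feats[0])
    rel = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(d)
                  for j in range(n)]
        m = max(logits)                      # stabilize the softmax
        exps = [math.exp(l - m) for l in logits]
        s = sum(exps)
        rel.append([e / s for e in exps])
    return rel

def relation_distill_loss(student_feats, teacher_feats):
    """Mean squared error between student and teacher relation maps:
    the student learns how the teacher relates tokens, not the
    teacher's raw feature values."""
    rs = token_relations(student_feats)
    rt = token_relations(teacher_feats)
    n = len(rs)
    return sum((rs[i][j] - rt[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

One practical appeal of this target is that the relation matrices are token-count by token-count, so student and teacher need not share a feature dimension.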
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
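The style-aware adaptation idea — a style code adjusting feed-forward weights — can be illustrated with a toy layer. The scale-and-shift modulation here, and the assumption that the style code is already projected to per-output-unit parameters, are illustrative guesses and not the paper's exact mechanism.

```python
def adaptive_feed_forward(x, w, style_scale, style_shift):
    """Toy style-modulated feed-forward layer. x is a list of token
    vectors, w a weight matrix (one row per output unit). The style
    code, pre-projected into per-unit (scale, shift) pairs, modulates
    the weights before they are applied, followed by a ReLU."""
    out = []
    for token in x:
        row_out = []
        for o in range(len(w)):
            # style modulates the weights of output unit o
            acc = sum((w[o][i] * style_scale[o] + style_shift[o]) * token[i]
                      for i in range(len(token)))
            row_out.append(max(acc, 0.0))  # ReLU
        out.append(row_out)
    return out
```

With unit scales and zero shifts the layer reduces to a plain feed-forward, while a different style code yields different outputs from the same content features, which is the effect the decoder exploits.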
Decompilation aims to transform a low-level programming language (LPL) (e.g., binary file) into its functionally-equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
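The GRN layer described above admits a compact sketch: per-channel global L2 aggregation, divisive normalization across channels, then a learnable scale/shift with a residual connection. The plain-Python version below follows that published description; the real layer of course operates on batched tensors rather than nested lists.

```python
import math

def grn(x, gamma, beta, eps=1e-6):
    """Global Response Normalization sketch, following the ConvNeXt V2
    description. x is a feature map indexed as x[h][w][c]; gamma and
    beta are scalar learnable parameters here for simplicity (per-
    channel in the actual layer)."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    # G(x): global L2 norm of each channel over all spatial positions
    gx = [math.sqrt(sum(x[h][w][c] ** 2
                        for h in range(H) for w in range(W)))
          for c in range(C)]
    # N(.): divisive normalization by the mean response across channels,
    # which makes channels compete with each other
    mean_g = sum(gx) / C
    nx = [g / (mean_g + eps) for g in gx]
    # learnable scale/shift plus residual: gamma * (x * N(G(x))) + beta + x
    return [[[gamma * x[h][w][c] * nx[c] + beta + x[h][w][c]
              for c in range(C)] for w in range(W)] for h in range(H)]
```

Channels with above-average global response are amplified and the rest are suppressed, which is the inter-channel feature competition the abstract refers to; with gamma = beta = 0 the layer reduces to the identity, so it can be inserted into a pre-trained ConvNeXt block without disruption.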